
refactor(cardano): shard, reorder, and merge the EWRAP boundary pipeline#978

Open
scarmuega wants to merge 15 commits into main from feat/shard-ewrap-work-units

Conversation


@scarmuega scarmuega commented Apr 25, 2026

Summary

Restructures the EWRAP epoch-boundary pipeline across six commits:

  • Shard EWRAP into prepare/shard/finalize work units (948d472e, pre-existing) — initial split of the monolithic EWRAP into three phases.
  • Rename EwrapPrepare → Ewrap, seed EpochState.end in ESTART (9627c3ae) — drop the misleading "Prepare" name; ESTART's EpochTransition now opens the end slot.
  • Rename EwrapShard → AccountShard (ae260792) — name the per-account work unit by what it processes, not its position in the pipeline.
  • Reorder: AccountShard runs before Ewrap (eaf8e08c) — settle per-account effects first, then run the global Ewrap visitors. EpochEndInit becomes a PATCH delta; EpochEndAccumulate accepts ewrap_progress = None as the natural shard-0 starting state.
  • Merge EwrapFinalize into Ewrap; drop EpochEndInit (a1e0d5cf) — collapse the redundant finalize phase into Ewrap; the wrap-up visitor now assembles the full EndStats (prepare-time fields + shard accumulators) and emits a single EpochWrapUp. One fewer state-machine state, one fewer delta type, one fewer commit cycle. Boundary close is now atomic in a single state-writer commit.
  • Seed EpochState.end in Genesis (5122e075) — restores the invariant that end = Some(...) whenever an AccountShard runs (the first boundary's shards otherwise hit a fresh genesis state with end = None).

Final pipeline: Estart → Roll … → Rupd → Roll … → AccountShard ×N → Ewrap (where Ewrap now both runs the global visitors and closes the boundary).
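The boundary leg of this sequence can be pictured as a tiny state machine. A minimal sketch under assumed names (`WorkUnit`, `next_boundary_unit` are hypothetical, not the crate's actual `WorkBuffer` API):

```rust
// Illustrative only: the real pipeline lives in WorkBuffer::pop_work and
// persists its cursor durably; the names here are hypothetical.
#[derive(Debug, PartialEq, Clone, Copy)]
#[allow(dead_code)]
enum WorkUnit {
    Estart,
    Roll,
    Rupd,
    AShard { shard_index: u32 },
    // Ewrap now both runs the global visitors and closes the boundary;
    // there is no separate finalize unit anymore.
    Ewrap,
}

// At the boundary: emit N account shards, then a single Ewrap.
fn next_boundary_unit(committed_shards: u32, total_shards: u32) -> WorkUnit {
    if committed_shards < total_shards {
        WorkUnit::AShard { shard_index: committed_shards }
    } else {
        WorkUnit::Ewrap
    }
}
```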

Test plan

  • cargo check -p dolos-cardano runs clean
  • cargo test -p dolos-cardano — all 98 unit tests pass (work-buffer state-machine, epoch delta proptests, etc.)
  • cargo build --tests — full workspace compiles
  • cargo test --test memory passes
  • cargo test --test epoch_pots --release — gold-standard end-to-end against DBSync ground truth (requires populated Mithril test instance)
  • Manual bootstrap from Mithril to verify the genesis seeding fix in 5122e075 resolved the original panic

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added an AShard work unit to run per-shard, per-account reward application.
    • Added account_shards configuration (defaults to 16) to control shard parallelism.
  • Improvements

    • Restructured epoch-boundary pipeline: rewards are applied per-shard and final wrap-up runs once per boundary for more predictable processing and crash recovery.
    • Reduced peak memory during key-range iteration.
  • Tests & Docs

    • Added memory-bounded iteration tests and updated pipeline documentation.

scarmuega and others added 6 commits April 24, 2026 17:40
Partitions the epoch-boundary EWRAP work unit — previously one monolithic
work unit that materialised O(active_accounts) in memory (rewards map,
deltas, logs, applied_rewards) — into three phase-specific work units that
each commit independently:

* EwrapPrepare: global classification (pools/dreps/proposals), MIRs,
  enactment + refund visitors for non-account entities, emits EpochEndInit
  seeding EpochState.end with the prepare-time globals and zeroed reward
  accumulators.
* EwrapShard(i): range-scoped (first-byte prefix bucket) load of pending
  rewards + accounts, runs rewards + drops visitors per account, emits
  EpochEndAccumulate with the shard's reward contribution.
* EwrapFinalize: reads the accumulated EpochState.end, emits EpochWrapUp
  (which transitions rolling/pparams snapshots and clears ewrap_progress).

Cross-shard handoff piggy-backs on EpochState rather than a new entity:
ewrap_progress: Option<u32> is the durable cursor and EpochState.end
accumulates across shards via the new deltas.

WorkBuffer gains EwrapShardingBoundary{shard_index, total_shards} and
EwrapFinaliseBoundary states; pop_work now takes ewrap_total_shards from
CardanoConfig (default 16). EpochEndAccumulate has an idempotency guard
keyed on ewrap_progress so shard re-execution after a crash is safe.

Detection-only crash recovery at initialize time logs a warning when
ewrap_progress is set; full block-rehydration resume is flagged as TODO.

Memory tests in tests/memory.rs verify both fjall and redb3 honour
range-scoped iter_entities with O(1) heap — the load-bearing property for
the shard design.
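The first-byte prefix bucketing described above can be sketched as follows; `shard_prefix_range` is a hypothetical helper, not the crate's actual partitioning code:

```rust
// Illustrative sketch: split the 256 possible leading key bytes into
// `total` contiguous buckets; shard `i` owns prefixes in [start, end).
// Every key falls in exactly one bucket, so shards never overlap.
fn shard_prefix_range(i: u32, total: u32) -> Result<(u32, u32), String> {
    if total == 0 || total > 256 {
        return Err(format!("invalid shard total: {total}"));
    }
    if i >= total {
        return Err(format!("shard index {i} out of range 0..{total}"));
    }
    let start = (256 * i) / total;
    let end = (256 * (i + 1)) / total;
    Ok((start, end))
}
```

With the default of 16 shards, each bucket covers 16 first-byte values; shard 0 would own prefixes 0x00..0x10.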

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n ESTART

Decouple two responsibilities that were tangled in EwrapPrepareWorkUnit:
the global epoch-boundary entity processing (now plain `Ewrap`) and the
structural opening of the `EpochState.end` slot (now done by ESTART's
`EpochTransition`). Ewrap's `EpochEndInit` delta keeps its overwrite
semantics; it now writes into a default-seeded slot rather than from
None.

Also adds `prev_end` / `prev_ewrap_progress` undo fields to
`EpochTransition` (serialized, like the other prev_* fields) so a
rollback after restart correctly restores them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The per-account leg of the epoch close was named after its position in the
EWRAP pipeline; AccountShard names what it actually does — apply rewards
and pool/drep delegator drops over a key-range slice of the account
namespace.

Also renames the related symbols (BoundaryWork::load_shard /
commit_shard → load_account_shard / commit_account_shard,
WorkBuffer::EwrapShardingBoundary → AccountShardingBoundary,
InternalWorkUnit::EwrapShard → AccountShard). The user-facing
`ewrap_total_shards` config field is intentionally preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…line

The epoch-boundary sequence is now AccountShard ×N → Ewrap → EwrapFinalize
(was Ewrap → AccountShard ×N → EwrapFinalize). Per-account work settles
first; the global Ewrap phase then patches the prepare-time fields onto an
EpochState.end that already has its reward accumulators populated.

State machine: WorkBuffer::pop_work transitions reordered, and
on_ewrap_boundary now takes ewrap_total_shards so the restart-at-boundary
entry can construct AccountShardingBoundary directly. The
total_shards == 0 defensive branch now skips to EwrapBoundary (global
phase) instead of EwrapFinaliseBoundary.

Delta semantics:
- EpochEndInit::apply is now a PATCH — writes only the prepare-time
  fields (pool counts, epoch_incentives, MIR amounts, proposal refunds)
  and leaves the accumulator fields alone. ewrap_progress is no longer
  touched by this delta. Dropped the unused prev_ewrap_progress field.
- EpochEndAccumulate::apply treats ewrap_progress = None as the natural
  starting state for shard 0 (unwrap_or(0) as the expected cursor).
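The cursor semantics described above can be sketched roughly as below; the types and fields are illustrative stand-ins, not the crate's real `EpochState` or delta definitions:

```rust
// Illustrative stand-ins for the real EpochState / EpochEndAccumulate.
struct EpochState {
    // Durable cursor: how many shards have committed this boundary.
    ewrap_progress: Option<u32>,
}

struct EpochEndAccumulate {
    shard_index: u32,
}

// Idempotent apply: None is the natural starting state for shard 0
// (unwrap_or(0) as the expected cursor); replaying an already-committed
// shard after a crash is a no-op.
fn apply(state: &mut EpochState, delta: &EpochEndAccumulate) {
    let expected = state.ewrap_progress.unwrap_or(0);
    if delta.shard_index != expected {
        return; // stale or out-of-order shard: skip
    }
    // ... fold the shard's reward contribution into EpochState.end ...
    state.ewrap_progress = Some(expected + 1);
}
```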

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…delta

The boundary close is now a single Ewrap work unit: it runs the global
visitors AND emits EpochWrapUp carrying the assembled final EndStats
(prepare-time fields combined with the AccountShard-populated accumulator
fields). The wrap-up visitor now constructs the final stats locally
instead of routing them through a separate EpochEndInit delta.

Side-benefits: one fewer state-machine state, one fewer delta type, one
fewer commit cycle. Atomicity also improves — the boundary close is now
a single state-writer commit, so a crash between Ewrap and EwrapFinalize
is no longer possible.

Test fixture in tests/epoch_pots/main.rs restructured to match the
post-reorder pipeline: accumulator reset gates on AccountShard
shard_index == 0; rewards CSV is dumped on the Ewrap arm.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After the boundary-pipeline reorder (AccountShard runs before Ewrap),
the first epoch's AccountShard hits `EpochEndAccumulate::apply` with
`entity.end == None` because Genesis bootstrapped the EpochState before
ESTART's `EpochTransition` had a chance to seed the slot. Seed
`end = Some(EndStats::default())` directly in Genesis to match the
invariant ESTART maintains for every subsequent epoch.
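A minimal sketch of the restored invariant, with placeholder types (the real `EndStats` carries the prepare-time fields and reward accumulators; names are illustrative):

```rust
// Placeholder types; the real definitions live in the cardano crate.
#[derive(Default)]
struct EndStats; // fields elided

struct EpochState {
    end: Option<EndStats>,
    ashard_progress: Option<u32>,
}

// Genesis now seeds `end` the same way ESTART's EpochTransition does for
// every later epoch, so the first boundary's shards never see end == None.
fn bootstrap_genesis_epoch() -> EpochState {
    EpochState {
        end: Some(EndStats::default()),
        ashard_progress: None,
    }
}
```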

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Apr 25, 2026

📝 Walkthrough

Walkthrough

Adds an AShard work-unit phase that shards per-account epoch-boundary reward application by entity-key range, snapshots per-shard accumulators into epoch state, and refactors Ewrap to handle global MIR/refund/finalization after all shards commit.

Changes

Cohort / File(s) Summary
AShard Core
crates/cardano/src/ashard/mod.rs, crates/cardano/src/ashard/work_unit.rs, crates/cardano/src/ashard/loading.rs, crates/cardano/src/ashard/commit.rs
New AShard work-unit: load_ashard applies per-account pending rewards within a key-range, emits per-shard EpochEndAccumulate, and commit_ashard writes drained logs, updates shard accumulators, and commits state/archive.
Shard Helpers & Rewards
crates/cardano/src/ashard/shard.rs, crates/cardano/src/ashard/rewards.rs
Adds shard range partitioning (first-byte prefix, validation) and adapts rewards visitor to use crate::ewrap types and shard-scoped reward retention.
Ewrap Refactor
crates/cardano/src/ewrap/loading.rs, crates/cardano/src/ewrap/commit.rs, crates/cardano/src/ewrap/work_unit.rs
Replaces load/compute/commit flow with load_ewrap/compute_ewrap_deltas/commit_ewrap; streaming helper gains optional range filter; Ewrap now handles MIR and boundary closure (no per-account reward application).
Ewrap State & Wrap-up
crates/cardano/src/ewrap/mod.rs, crates/cardano/src/ewrap/wrapup.rs
Exposes ending_state (pub(crate)), adds shard accumulator fields to BoundaryWork, and changes EndStats derivation to use epoch-state accumulators saved into EpochState.end.
Epoch State & Deltas
crates/cardano/src/model/epochs.rs, crates/cardano/src/model/mod.rs
Adds ashard_progress to EpochState, AShardProgress snapshot type, new EpochEndAccumulate delta (apply/undo with idempotency guards), and wires delta into CardanoDelta enum and epoch transition logic.
Work Pipeline & Buffering
crates/cardano/src/work.rs, crates/cardano/src/lib.rs
Introduces AShardingBoundary and InternalWorkUnit::AShard; wires AShardWorkUnit into CardanoWorkUnit and work-materialization; receive_block/pop_work now accept account_shards and drive per-shard emission.
Config & Genesis
crates/core/src/config.rs, crates/cardano/src/genesis/mod.rs
Adds CardanoConfig.account_shards with default 16 and accessor; genesis bootstraps EpochState.end = Some(EndStats::default()) and sets initial ashard_progress = None.
Ewrap/Bdry Reorg & Visitors
crates/cardano/src/ewrap/mod.rs, crates/cardano/src/ewrap/rewards.rs
Moves per-account applied-rewards concept to AShard; removes rewards export from ewrap and adjusts visitor signatures/usages.
Docs & Tests
crates/cardano/work_units.md, crates/cardano/src/model/pending.rs, crates/cardano/src/rupd/work_unit.rs, skills/debug-epoch-mismatch/SKILL.md, tests/epoch_pots/main.rs, tests/memory.rs
Adds work-units doc; updates docstrings to reflect AShard/Ewrap responsibilities; test changes: accumulate applied_rewards across AShard units for CSV dump and add memory-boundedness tests for range iteration.
Misc: Error Type
crates/core/src/lib.rs
Adds ChainError::InvalidConfig(String) variant for config validation errors.

Sequence Diagram(s)

sequenceDiagram
    participant Boundary as WorkBuffer/Boundary
    participant AShard as AShardWorkUnit
    participant BWLoad as BoundaryWork::load_ashard
    participant BWCommit as BoundaryWork::commit_ashard
    participant Epoch as EpochState
    participant Ewrap as EwrapWorkUnit

    Boundary->>AShard: emit AShard(slot, shard_index)
    AShard->>BWLoad: load_ashard(state, genesis, shard_index, total_shards, range)
    BWLoad->>BWLoad: load pending rewards for range\niterate accounts, apply rewards+drops
    BWLoad->>Epoch: snapshot shard accumulators\nemit EpochEndAccumulate delta
    AShard->>BWCommit: commit_ashard(state, archive, range)
    BWCommit->>Epoch: write drained logs, update ashard_progress
    BWCommit->>Boundary: commit writers
    Boundary->>Ewrap: after all shards -> emit Ewrap
    Ewrap->>Ewrap: load_ewrap(), apply MIR/refunds, finalize EndStats
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly related PRs

Poem

🐰 Hop, I sorted shards so neat,

Keys in ranges, memory sweet.
Rewards applied in bounded stride,
Epochs gather what each did provide.
AShard hops — then Ewrap's pride. 🌿

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: refactoring the EWRAP boundary pipeline by introducing sharding, reordering, and merging phases.
Docstring Coverage ✅ Passed Docstring coverage is 80.46% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.



scarmuega and others added 6 commits April 25, 2026 11:51
Pulls the AccountShard work unit out of `ewrap/` into a peer module
`ashard/` (matching the layout of `estart/`, `rupd/`, `roll/`,
`genesis/`). The shared `BoundaryWork` / `BoundaryVisitor` infrastructure
and the drops visitor (used by both phases) stay in `ewrap/`; `ashard/`
imports them.

Moves: `rewards.rs`, `shard.rs`, `AccountShardWorkUnit` (from
`work_unit.rs`), and the `load_*` / `commit_*` impl blocks. Visibility
on shared `BoundaryWork` helpers (`new_empty`, `load_pool_data`,
`load_drep_data`, `stream_and_apply_namespace`) widened from private to
`pub(crate)`. The `ending_state` field also widened to `pub(crate)` so
peer modules can mutate it (e.g. `wrapup.flush` already does this).

Method/identifier renames to match the new module path:
- `BoundaryWork::load_account_shard` → `load_ashard`
- `BoundaryWork::commit_account_shard` → `commit_ashard`
- `WorkUnit::name()` returns `"ashard"`

Type name `AccountShardWorkUnit` is preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mments

Sweeps the docstrings/comments touched in this PR for references to
phases, work units, and deltas that no longer exist after the rename /
reorder / merge / split sequence:

- Restore the in-place explanation for the "rewards before drops"
  HACK in `ashard/loading.rs` (the dangling "see comment on the
  pre-shard path" pointed to a comment that was deleted when the
  prepare phase was removed).
- Drop "prepare phase" / "finalize phase" wording from `BoundaryWork`
  field docstrings, `commit_ewrap` comments, and `loading.rs` section
  dividers — neither phase exists; there's only Ewrap (global + close)
  and AccountShard (per-account).
- Update the ESTART `EpochTransition` description in `work_units.md`
  so it reflects the post-merge data flow: AccountShards populate the
  accumulators directly, then Ewrap reads them back and emits
  `EpochWrapUp` with the final `EndStats` (no `EpochEndInit` patch step
  anymore).
- Rename `compute_prepare_deltas` → `compute_ewrap_deltas`. The
  "prepare" name was a leftover from the `EwrapPrepare` work unit; the
  method is now the only Ewrap-phase compute helper.
- Tighten `load_pending_rewards_range` docstring; flag that the `None`
  branch is currently unused.

No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The shard-related identifiers and comments were named after the legacy
EWRAP pipeline that bundled the global epoch-boundary work and the
per-account shards together. With AccountShard now a distinct work unit
in its own module, those names are misleading. Rename to use the
`ashard` prefix consistently with the module path:

- `CardanoConfig::ewrap_total_shards` → `ashard_total`
- `CardanoConfig::DEFAULT_EWRAP_TOTAL_SHARDS` → `DEFAULT_ASHARD_TOTAL`
- `EpochState::ewrap_progress` → `ashard_progress`
- `prev_ewrap_progress` → `prev_ashard_progress` on `EpochEndAccumulate`,
  `EpochWrapUp`, and `EpochTransition`
- `WorkBuffer::receive_block` / `on_ewrap_boundary` / `pop_work`
  parameter `ewrap_total_shards` → `ashard_total`
- Error messages in `ashard/shard.rs` updated to match.

Also fixes comment / doc misattributions where "EWRAP" was used for
work that's now in `AccountShard`:
- `PendingRewardState` / `DequeueReward` are consumed by `AccountShard`,
  not Ewrap.
- `PendingMirState` / `DequeueMir` are consumed by Ewrap (clarified).
- `AppliedReward` and the `applied_rewards` field are populated during
  AccountShard, not Ewrap.
- RUPD's docstring now says rewards are consumed by `AccountShard`.
- Crash-recovery wording in `lib.rs` says "mid-boundary" instead of
  "mid-EWRAP" since the cursor specifically tracks AccountShard
  progress.

BREAKING CONFIG CHANGE: existing `dolos.toml` files that explicitly set
`ewrap_total_shards` need to rename the key to `ashard_total`. Users
relying on the default (omitted) are unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Aligns the type and variant names with the module path convention:

- struct `AccountShardWorkUnit` → `AShardWorkUnit`
- enum variant `CardanoWorkUnit::AccountShard` → `AShard`
- enum variant `InternalWorkUnit::AccountShard` → `AShard`
- WorkBuffer state `AccountShardingBoundary` → `AShardingBoundary`
- module re-export and all callers updated to match
- prose / docstrings / log messages also use `AShard` consistently

The module path is `crate::ashard`, so the type now reads as
`ashard::AShardWorkUnit`. No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per review feedback: the user-facing config name should be self-explanatory
in `dolos.toml`. Renames everywhere for consistency:

- `CardanoConfig::ashard_total` field → `account_shards`
- `CardanoConfig::ashard_total()` accessor → `account_shards()`
- `CardanoConfig::DEFAULT_ASHARD_TOTAL` → `DEFAULT_ACCOUNT_SHARDS`
- WorkBuffer parameters and error messages updated to match.

BREAKING CONFIG CHANGE: existing `dolos.toml` files that explicitly set
this option (under any prior name from this PR) need to use
`account_shards`. Users relying on the default are unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Guards against a config change to `account_shards` corrupting an
in-flight boundary. Previously, if dolos crashed mid-boundary and the
operator changed `account_shards` between crash and restart, the resume
would re-partition the account key space with the new count, mismatching
the cursor's already-committed shards.

Fix: snapshot the boundary's shard count into state at the first
`EpochEndAccumulate` apply. The persisted total is authoritative for the
duration of the in-flight boundary; the new config value only takes
effect on the next boundary.

Changes:
- New `AShardProgress { committed, total }` struct stored at
  `EpochState.ashard_progress: Option<AShardProgress>` (was
  `Option<u32>`).
- `EpochEndAccumulate` carries `total_shards`. Its apply validates the
  delta's `total_shards` matches any previously persisted total and
  surfaces an error if they diverge (would only happen if a work unit
  was constructed with a stale config view).
- `EpochWrapUp` and `EpochTransition` undo fields adapted to the new
  type.
- `AShardWorkUnit::load` / `commit_state` read the persisted total when
  present and fall back to `config.account_shards()` for fresh
  boundaries.
- `CardanoLogic` caches `effective_account_shards` (= persisted total
  if a boundary is in flight, else config). Refreshed at every
  `pop_work` call so `receive_block` (which has no state access) can
  use the up-to-date value when constructing
  `WorkBuffer::AShardingBoundary`.
- Crash-recovery wording updated to surface a clear warning when the
  persisted total disagrees with current config.
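The snapshot-and-validate rule can be sketched as follows; the names are simplified from the description above and the error type is a plain String for illustration:

```rust
// As described above: the boundary's shard count is snapshotted into
// state on the first accumulate and is authoritative until wrap-up.
struct AShardProgress {
    committed: u32,
    total: u32,
}

struct EpochState {
    ashard_progress: Option<AShardProgress>,
}

fn apply_accumulate(state: &mut EpochState, total_shards: u32) -> Result<(), String> {
    match state.ashard_progress {
        // First shard of the boundary: persist the total.
        None => {
            state.ashard_progress = Some(AShardProgress { committed: 1, total: total_shards });
            Ok(())
        }
        // Later shards must agree with the persisted total; a config
        // change mid-boundary surfaces as an error instead of silently
        // re-partitioning the key space under the in-flight cursor.
        Some(ref mut p) => {
            if p.total != total_shards {
                return Err(format!(
                    "shard total mismatch: persisted {} vs delta {}",
                    p.total, total_shards
                ));
            }
            p.committed += 1;
            Ok(())
        }
    }
}
```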

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@scarmuega scarmuega marked this pull request as ready for review April 26, 2026 13:46

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
crates/cardano/src/ashard/rewards.rs (1)

114-188: ⚠️ Potential issue | 🟡 Minor

Log message clarity: add shard context to per-shard telemetry.

The flush method (lines 114–188) now runs once per AShard, but the telemetry messages ("rewards remaining before drain", "SPENDABLE REWARDS LEFT UNPROCESSED", etc.) don't indicate shard scope. Per-shard loading correctly isolates each shard's RewardMap via load_pending_rewards_range(state, Some(range)), so the drain and error detection are functionally sound. However, for operational visibility, these log statements should include shard index and total_shards to clarify they reflect per-shard metrics, not boundary-wide state. This is especially helpful for debugging when multiple shards report metrics in parallel.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/cardano/src/ashard/rewards.rs` around lines 114 - 188, The per-shard
logs in flush don't indicate which shard they belong to; update the flush
logging to include shard context (shard_index and total_shards). Add
shard_index: usize and total_shards: usize as fields on the AShard struct (or
otherwise make them available on self or ctx), ensure any AShard
construction/population passes these values, and then include %shard_index and
%total_shards in the tracing::debug! and tracing::error! invocations (and the
debug! at the end) around ctx.rewards/ drain_unspendable/
applied_reward_credentials so all per-shard telemetry shows shard scope.
crates/cardano/src/model/epochs.rs (1)

980-981: ⚠️ Potential issue | 🟡 Minor

Fix the unused_doc_comments rustdoc warning flagged by CI.

Pipeline reports rustdoc does not generate documentation for macro invocations on these lines. The /// comment immediately above the prop_compose! macro invocation is dropped on the floor by rustdoc and triggers the lint. Convert to a regular // comment (or move it inside the closure body if you want it preserved).

🛠️ Proposed fix
-    /// `EpochStatsUpdate::apply` calls `rolling.live_mut` which asserts `next` is None,
-    /// so we need a specialized generator that keeps `rolling.next` empty.
+    // `EpochStatsUpdate::apply` calls `rolling.live_mut` which asserts `next` is None,
+    // so we need a specialized generator that keeps `rolling.next` empty.
     prop_compose! {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/cardano/src/model/epochs.rs` around lines 980 - 981, The rustdoc
warning comes from using a /// doc comment immediately above the prop_compose!
macro (related to EpochStatsUpdate::apply and rolling.live_mut which asserts
rolling.next is None), so replace that leading /// comment with a non-doc
comment (//) or move the explanation into the generator body/closure so rustdoc
won't try to document the macro invocation; specifically update the comment
above the prop_compose! invocation that references rolling.live_mut and
rolling.next to use // (or relocate it inside the prop_compose! closure) to
silence the unused_doc_comments lint.
🧹 Nitpick comments (12)
crates/cardano/src/ashard/rewards.rs (1)

56-77: Stale EWRAP reference in comment.

The comment at lines 59–61 still says "registered at RUPD startStep time but deregistered before EWRAP". After this refactor the per-account registration check happens during AShard, not Ewrap. Suggest updating to "deregistered before AShard" (or "before the boundary's per-account leg") to keep the comment in lockstep with the renamed pipeline.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/cardano/src/ashard/rewards.rs` around lines 56 - 77, Update the stale
comment referencing "EWRAP" to reflect the new pipeline stage: change the text
around the account deregistration explanation to say "deregistered before
AShard" (or "before the boundary's per-account leg") so it matches where the
per-account registration check now runs; the relevant area to edit is the
comment above the account.is_registered() branch in the function that calls
reward.total_value() and ctx.rewards.return_reward_to_treasury(total). Ensure
the updated comment keeps the existing clarification about pre-filtered accounts
and returned_rewards behavior.
crates/cardano/src/ashard/loading.rs (1)

19-57: Optional: drop the Option<Range<EntityKey>> wrapper.

The doc-comment already states load_ashard is the only caller and always passes Some(range). Making the parameter Range<EntityKey> directly removes a never-taken branch and tightens the contract. If a future "load all" caller appears, reintroducing the option is trivial.

♻️ Suggested refactor
-    fn load_pending_rewards_range<D: Domain>(
-        &mut self,
-        state: &D::State,
-        range: Option<Range<EntityKey>>,
-    ) -> Result<(), ChainError> {
-        let pending_iter = state
-            .iter_entities_typed::<PendingRewardState>(PendingRewardState::NS, range)?;
+    fn load_pending_rewards_range<D: Domain>(
+        &mut self,
+        state: &D::State,
+        range: Range<EntityKey>,
+    ) -> Result<(), ChainError> {
+        let pending_iter = state
+            .iter_entities_typed::<PendingRewardState>(PendingRewardState::NS, Some(range))?;

And drop the Some(range.clone()) at the call site (line 81) to pass range.clone() directly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/cardano/src/ashard/loading.rs` around lines 19 - 57, The function
load_pending_rewards_range currently accepts Option<Range<EntityKey>> but the
only caller (load_ashard) always passes Some(range), so tighten the signature to
take Range<EntityKey> directly: change the parameter type in
load_pending_rewards_range and update callers (e.g., in load_ashard remove
Some(range.clone()) and pass range.clone() directly); ensure the body drops any
pattern-matching for None (no-op) and uses the range value when calling
state.iter_entities_typed::<PendingRewardState>(PendingRewardState::NS, range).
tests/epoch_pots/main.rs (1)

500-513: Optional: collapse the two if let Some(boundary) checks.

The shard-index==0 reset and the per-shard extend can share a single boundary binding, which trims a branch and makes the relationship between "capture ending epoch" and "extend rewards" obvious.

♻️ Suggested refactor
                 CardanoWorkUnit::AShard(shard) => {
-                    if shard.shard_index() == 0 {
-                        // First shard of this boundary — reset accumulator
-                        // and capture the ending epoch.
-                        accumulated_applied.clear();
-                        if let Some(boundary) = shard.boundary() {
-                            accumulated_ending_epoch =
-                                Some(boundary.ending_state().number);
-                        }
-                    }
-                    if let Some(boundary) = shard.boundary() {
+                    if let Some(boundary) = shard.boundary() {
+                        if shard.shard_index() == 0 {
+                            // First shard of this boundary — reset
+                            // accumulator and capture the ending epoch.
+                            accumulated_applied.clear();
+                            accumulated_ending_epoch =
+                                Some(boundary.ending_state().number);
+                        }
                         accumulated_applied.extend(boundary.applied_rewards.iter().cloned());
                     }
                 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/epoch_pots/main.rs` around lines 500 - 513, Collapse the two separate
`if let Some(boundary)` checks into a single binding for `boundary` when
matching `CardanoWorkUnit::AShard(shard)`: inside one `if let Some(boundary) =
shard.boundary()` block first check `if shard.shard_index() == 0` to clear
`accumulated_applied` and set `accumulated_ending_epoch =
Some(boundary.ending_state().number)`, then unconditionally call
`accumulated_applied.extend(boundary.applied_rewards.iter().cloned())`; this
removes the duplicated boundary lookup and makes the reset-and-extend logic
contiguous.
crates/cardano/work_units.md (1)

5-11: Add a language to the fenced block (MD040).

markdownlint-cli2 flagged this fence as missing a language. Use text (or any non-rendered identifier) so the diagram passes lint.

📝 Suggested fix
-```
+```text
 Estart  →  Roll …  →  Rupd  →  Roll …  →  AShard ×N  →  Ewrap
 (open)     (blocks)   (RUPD)   (blocks)    (per-account)         (global + close)
                                                                           │
                                                                           ▼
                                                                   next epoch's Estart
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In @crates/cardano/work_units.md around lines 5 - 11, The fenced code block
containing the epoch diagram starting with "Estart → Roll … → Rupd → Roll
… → AShard ×N → Ewrap" is missing a language tag (MD040); fix it by adding a
language identifier (use "text") immediately after the opening triple backticks
so the block becomes ```text and the diagram passes markdownlint-cli2
validation.


crates/cardano/src/lib.rs (2)

284-341: Consolidate duplicated `effective_account_shards` lookup logic.

The inline computation at lines 338–341 is identical to the new helper `read_effective_account_shards` (lines 253–258). On top of that, `load_epoch::<D>(state)` is invoked twice within `initialize` (line 292 for the warnings, line 338 to derive `effective_account_shards`). A small refactor avoids the double read and keeps the “persisted total else config” rule in one place.



♻️ Proposed refactor

Lift the helper to a free fn and reuse it (and reuse the loaded `EpochState` for the warnings):

```diff
-    /// Compute the effective `account_shards` value: stored
-    /// `ashard_progress.total` if a boundary is in flight, otherwise the
-    /// configured value.
-    fn read_effective_account_shards<D: Domain>(&self, state: &D::State) -> u32 {
-        load_epoch::<D>(state)
-            .ok()
-            .and_then(|e| e.ashard_progress.as_ref().map(|p| p.total))
-            .unwrap_or_else(|| self.config.account_shards())
-    }
+    /// Compute the effective `account_shards` value: stored
+    /// `ashard_progress.total` if a boundary is in flight, otherwise the
+    /// configured value.
+    fn read_effective_account_shards<D: Domain>(&self, state: &D::State) -> u32 {
+        effective_account_shards_from_state::<D>(state, &self.config)
+    }
+}
+
+fn effective_account_shards_from_state<D: Domain>(state: &D::State, config: &CardanoConfig) -> u32 {
+    load_epoch::<D>(state)
+        .ok()
+        .and_then(|e| e.ashard_progress.as_ref().map(|p| p.total))
+        .unwrap_or_else(|| config.account_shards())
 }
```

And in `initialize`, compute once and reuse the loaded epoch for both the warnings and the cached value.
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@crates/cardano/src/lib.rs` around lines 284 - 341, Extract and reuse the
epoch load and the effective-account-shards logic: in initialize(), call
load_epoch::<D>(state) once and store the resulting EpochState (if Ok) for the
warnings; replace the inline computation of effective_account_shards with a call
to the existing helper read_effective_account_shards (or lift that helper to a
free fn if necessary) so the "persisted total else config" rule is implemented
in one place; update references to use the single loaded epoch value for the
tracing::warn! blocks and derive effective_account_shards from
read_effective_account_shards(state, &config) (or equivalent) to remove the
duplicate load_epoch::<D>(state) call.
```

</details>

---

`404-408`: **Per-`pop_work` state read may be unnecessary outside boundaries.**

`effective_account_shards` is refreshed on every `pop_work` call (each block cycle) via a `read_entity_typed` lookup, but it only matters at epoch boundaries. The cost is small (single point read), but you could skip the read when the cached value is non-stale by only refreshing when `ashard_progress` could have changed (i.e., before an actual boundary handoff). Worth considering if `pop_work` shows up in profiling.
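To make the suggestion concrete, here is a minimal self-contained sketch of the cached-marker idea; `ShardCache`, its fields, and `refresh` are illustrative names, not the crate's API:

```rust
// Hypothetical sketch: skip the state read unless the persisted marker
// (mirroring EpochState.ashard_progress.total) actually changed.
struct ShardCache {
    effective: u32,                // cached effective_account_shards
    last_seen_total: Option<u32>,  // marker from the last refresh
}

impl ShardCache {
    /// Refresh only when the persisted total differs from the cached marker;
    /// `None` means no boundary in flight, so fall back to the config value.
    fn refresh(&mut self, persisted_total: Option<u32>, config_default: u32) {
        if persisted_total != self.last_seen_total {
            self.effective = persisted_total.unwrap_or(config_default);
            self.last_seen_total = persisted_total;
        }
    }
}
```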

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@crates/cardano/src/lib.rs` around lines 404 - 408, The code refreshes
self.effective_account_shards on every pop_work by calling
read_effective_account_shards::<D>(domain.state()), but this value only matters
when the epoch boundary handoff (EpochState.ashard_progress.total) can change;
avoid the repeated read by first checking whether ashard_progress has changed
and only calling read_effective_account_shards when it has. Add a small cached
marker on the struct (e.g., last_ashard_total or last_ashard_progress) and in
pop_work compare domain.state().ashard_progress.total (or the appropriate
accessor) to that cached value; if different, call read_effective_account_shards
and update both effective_account_shards and the cached marker, otherwise skip
the read and keep the existing effective_account_shards.
```

</details>

</blockquote></details>
<details>
<summary>crates/cardano/src/ewrap/commit.rs (1)</summary><blockquote>

`91-100`: **Full-table account scan to apply a handful of MIR rewards.**

`stream_and_apply_namespace::<D, AccountState>(state, &writer, None)` walks every account in state to surface the small set with MIR-queued deltas. The inline comment acknowledges this is "effectively a targeted write via the streaming path", but on mainnet-sized account tables this is a significant cost paid every boundary, regardless of how few MIRs are pending (often zero).

Consider a direct path: iterate `self.applied_mir_credentials` (or the credentials of the queued `AssignRewards` deltas), point-read each `AccountState`, apply, and write. The streaming helper can stay for the full-set namespaces (Pool/DRep/Proposal/EpochState), but accounts during Ewrap are inherently sparse.
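A toy model of the targeted path, with a `HashMap` standing in for the account table and string credentials standing in for the real credential types (names are illustrative, not the crate's API):

```rust
use std::collections::HashMap;

// Apply queued MIR deltas by point-reading only the affected accounts,
// instead of streaming the whole table to find the few with pending rewards.
fn apply_mir_targeted(
    accounts: &mut HashMap<String, u64>, // stand-in for the AccountState table
    queued: &[(String, u64)],            // (credential, reward) deltas
) -> usize {
    let mut touched = 0;
    for (cred, amount) in queued {
        if let Some(balance) = accounts.get_mut(cred) { // point read, not a scan
            *balance += amount;
            touched += 1;
        }
    }
    touched
}
```

The cost scales with the number of queued MIR deltas (often zero at a boundary) rather than with the total account count.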

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@crates/cardano/src/ewrap/commit.rs` around lines 91 - 100, The current call
to stream_and_apply_namespace::<D, AccountState>(state, &writer, None) performs
a full-table scan of AccountState each boundary; instead implement a targeted
path that iterates over self.applied_mir_credentials (or the credentials from
queued AssignRewards deltas), point-reads each AccountState, applies the MIR
delta, and writes the result via the same writer so only affected accounts are
touched; keep stream_and_apply_namespace for full-set namespaces like
Pool/DRep/Proposal/EpochState, and ensure the new code reuses the existing
apply/serialize logic used by stream_and_apply_namespace to maintain invariants
and error handling.
```

</details>

</blockquote></details>
<details>
<summary>crates/cardano/src/ashard/work_unit.rs (1)</summary><blockquote>

`62-129`: **Avoid recomputing `total_shards`/`range` from state in `commit_state`.**

`load` computes `total_shards` and `range` from persisted state, then calls `BoundaryWork::load_ashard(..., shard_index, total_shards, range)` — so the boundary already owns these values. `commit_state` repeats the same state read and recomputation, with two minor concerns:

1. **Code duplication**: identical 8-line block in both methods (and a third copy in `crates/cardano/src/lib.rs`).
2. **Subtle drift risk**: if `commit_state` ever observed a different `ashard_progress.total` than `load` did (e.g., a shard's apply landing between the two phases of the same work unit), the recomputed `range` would diverge from what was loaded. Today this can't happen, but cross-phase reuse of the captured values is more obviously correct.

Consider exposing the captured shard info from `BoundaryWork` (or storing the computed `range` in `AShardWorkUnit` after `load`) and have `commit_state` reuse it instead of re-deriving.



<details>
<summary>♻️ Sketch</summary>

```diff
 pub struct AShardWorkUnit {
     slot: BlockSlot,
     config: CardanoConfig,
     genesis: Arc<Genesis>,
     shard_index: u32,
     boundary: Option<BoundaryWork>,
+    /// Captured during `load` so `commit_state` reuses the exact same range.
+    range: Option<std::ops::Range<dolos_core::EntityKey>>,
 }
```

Then `load` stores `self.range = Some(range.clone())` and `commit_state` consumes it.
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@crates/cardano/src/ashard/work_unit.rs` around lines 62 - 129, The
load/commit_state duplication and drift risk can be fixed by capturing the
computed total_shards and range during load and reusing them in commit_state
instead of re-reading state: after computing total_shards and range in load
(before calling BoundaryWork::load_ashard), store the values on the work unit
(e.g., add fields self.total_shards and/or self.range and assign them there),
and then change commit_state to take the boundary via self.boundary.as_mut() and
use the stored range/total_shards when calling boundary.commit_ashard::<D>(...)
rather than recomputing via load_epoch; update BoundaryWork usage only to
consume the already-stored range if needed.
```

</details>

</blockquote></details>
<details>
<summary>crates/cardano/src/work.rs (2)</summary><blockquote>

`276-279`: **Test `TEST_TOTAL_SHARDS = 1` doesn't cover multi-shard sequencing in this state machine.**

These tests collapse `AShard` into the `EWrap` tag and set `TEST_TOTAL_SHARDS = 1`, so the multi-shard loop in `pop_work` (lines 226–245) — including the `next_index >= total_shards` transition — is exercised only with `total_shards = 1`. Worth adding at least one test with `TEST_TOTAL_SHARDS > 1` that asserts the work buffer emits exactly N `AShard` units before transitioning to `EwrapBoundary` (and that `last_point_seen` / cursor invariants hold across that transition). This catches off-by-one and ordering regressions in the boundary state machine that the current tests would silently miss.



Want me to draft this test?


Also applies to: 325-330
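The shape such a test could assert, modeled here with a toy sequencer (the names `Unit` and `boundary_sequence` are hypothetical; the real test would drive the crate's work buffer via `pop_work`):

```rust
// N consecutive AShard units, indexed 0..N, followed by exactly one Ewrap.
#[derive(Debug, PartialEq)]
enum Unit {
    AShard(u32),
    Ewrap,
}

fn boundary_sequence(total_shards: u32) -> Vec<Unit> {
    assert!(total_shards >= 1, "validate_total_shards rejects 0");
    let mut out: Vec<Unit> = (0..total_shards).map(Unit::AShard).collect();
    out.push(Unit::Ewrap);
    out
}
```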

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@crates/cardano/src/work.rs` around lines 276 - 279, Add a new unit test that
sets TEST_TOTAL_SHARDS > 1 (instead of the current const TEST_TOTAL_SHARDS = 1)
to exercise the multi-shard sequencing in the pop_work logic: drive the
WorkBuffer/state machine (invoking pop_work) and assert that it emits exactly N
consecutive AShard units (N == TEST_TOTAL_SHARDS) before producing an
EwrapBoundary (or EWrap/Ewrap as used in this code), and verify that
last_point_seen and any cursor invariants are preserved across the transition;
target the test to exercise the pop_work path that contains the next_index >=
total_shards branch and use the existing helpers/setup used by the other tests
so it integrates with the same state machine and coverage.
```

</details>

---

`120-143`: **`account_shards == 0` "skip-shards" fallback contradicts `validate_total_shards`.**

`crates/cardano/src/ashard/shard.rs::validate_total_shards` rejects `0` outright ("account_shards must be >= 1"), yet both `on_ewrap_boundary` and the `PreEwrapBoundary` arm of `pop_work` treat `0` as "no shards — go straight to Ewrap". If a config bug or missing validation ever hands `0` here, you'll silently skip per-account boundary processing instead of failing fast. The defensive comment ("Shouldn't happen with a valid config") tacitly acknowledges this.

Either drop the branches and let an `unreachable!()` / `assert!(account_shards >= 1)` fail loudly, or change `validate_total_shards` to accept `0` as "no shards" if that's the intended semantics. Right now the two pieces disagree.




Also applies to: 210-225

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@crates/cardano/src/work.rs` around lines 120 - 143, The code currently treats
account_shards == 0 as a valid "no shards" path in on_ewrap_boundary (and
similarly in the PreEwrapBoundary arm of pop_work), which contradicts
validate_total_shards that requires account_shards >= 1; change the behavior to
fail-fast: remove the special-case branch that returns EwrapBoundary when
account_shards == 0 and instead assert or panic if account_shards < 1 (e.g., add
assert!(account_shards >= 1) at the start of on_ewrap_boundary), keeping the
normal Restart -> AShardingBoundary path for valid shard counts; alternatively,
if the intended semantics are that 0 means "no shards", update
validate_total_shards to accept 0 and document that invariant—pick one
consistent approach and apply the same fix in the PreEwrapBoundary/pop_work code
paths so both agree with validate_total_shards.
```

</details>

</blockquote></details>
<details>
<summary>crates/cardano/src/ashard/shard.rs (1)</summary><blockquote>

`37-60`: **`debug_assert!` makes invalid `total_shards` a runtime div-by-zero in release.**

The contract is documented and there's a `validate_total_shards` helper, but the only enforcement inside `shard_key_range` is `debug_assert!`. In release builds, `total_shards == 0` reaches `let step = PREFIX_SPACE / total_shards;` and panics with a less informative message; non-divisor values silently produce truncated `step` arithmetic.

Since this is a public function and the cost of a real assertion (or returning `Result`) is negligible compared to the I/O the caller does, prefer `assert!` (or propagate the error from `validate_total_shards`).
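For reference, a self-contained model of the divides-256 invariant with hard asserts; `u32` prefixes stand in for `EntityKey`, and the function bodies are a sketch of the contract rather than the crate's implementation:

```rust
use std::ops::Range;

const PREFIX_SPACE: u32 = 256; // one-byte key prefix space

// Reject 0 (division by zero) and non-divisors of 256 (truncated ranges
// would leave key prefixes uncovered).
fn validate_total_shards(total: u32) -> Result<(), String> {
    if total == 0 || PREFIX_SPACE % total != 0 {
        return Err(format!(
            "account_shards must be >= 1 and divide {PREFIX_SPACE}, got {total}"
        ));
    }
    Ok(())
}

fn shard_key_range(shard_index: u32, total_shards: u32) -> Range<u32> {
    // Real asserts: enforced in release builds too, unlike debug_assert!.
    assert!(
        validate_total_shards(total_shards).is_ok(),
        "invalid total_shards: {total_shards}"
    );
    assert!(shard_index < total_shards);
    let step = PREFIX_SPACE / total_shards;
    (shard_index * step)..((shard_index + 1) * step)
}
```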



<details>
<summary>🛡️ Proposed change</summary>

```diff
 pub fn shard_key_range(shard_index: u32, total_shards: u32) -> Range<EntityKey> {
-    debug_assert!(validate_total_shards(total_shards).is_ok());
-    debug_assert!(shard_index < total_shards);
+    assert!(
+        validate_total_shards(total_shards).is_ok(),
+        "invalid total_shards: {total_shards}"
+    );
+    assert!(shard_index < total_shards, "shard_index {shard_index} >= total_shards {total_shards}");
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@crates/cardano/src/ashard/shard.rs` around lines 37 - 60, The function
shard_key_range uses debug_assert! to check validate_total_shards(total_shards)
and shard_index < total_shards, which leaves a division-by-zero and silent
truncation in release; change the debug_assert! checks to real runtime checks
(e.g., assert!(validate_total_shards(total_shards).is_ok()) and
assert!(shard_index < total_shards)) or convert the function to return Result
and propagate validate_total_shards() error from shard_key_range so PREFIX_SPACE
/ total_shards cannot divide by zero and invalid total_shards are handled
deterministically; update callers if you choose the Result approach and keep
references to shard_key_range, validate_total_shards, and PREFIX_SPACE when
making the change.
```

</details>

</blockquote></details>
<details>
<summary>crates/cardano/src/ewrap/loading.rs (1)</summary><blockquote>

`280-285`: **`shard_applied_*` fields are dead state on Ewrap-constructed `BoundaryWork`.**

`new_empty` zero-initializes `shard_applied_effective`, `shard_applied_unspendable_to_treasury`, and `shard_applied_unspendable_to_reserves`, but Ewrap never reads/writes them — the per-shard accumulators land in `EpochState.end` via `EpochEndAccumulate` and are read from `ending_state.end` by `wrapup.flush`. The fields exist only because `BoundaryWork` is shared with the AShard path.

Not a bug, but the dual-use API obscures intent. If a future change adds a third caller, it's easy to misuse these fields. Consider either splitting `BoundaryWork` into `AShardWork`/`EwrapWork` or documenting on the struct which fields each phase owns.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@crates/cardano/src/ewrap/loading.rs` around lines 280 - 285, BoundaryWork's
shard_applied_effective / shard_applied_unspendable_to_treasury /
shard_applied_unspendable_to_reserves are dead for the Ewrap path (zeroed in
new_empty and never read by Ewrap; real per-shard accumulators live in
EpochState.end via EpochEndAccumulate and are consumed by wrapup.flush), so fix
by either splitting the dual-purpose struct into two clear types (e.g.,
AShardWork and EwrapWork) and move the AShard-only shard_applied_* fields into
AShardWork, or add explicit documentation/comments on BoundaryWork and those
three fields noting they are owned/used only by the AShard path; update
new_empty, any constructors, and usages of BoundaryWork (including references in
EpochEndAccumulate and wrapup.flush) to use the new types or to reflect the
documented ownership to prevent future misuse.
```

</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @crates/cardano/src/model/epochs.rs:

  • Around line 737-740: The apply path currently does
    entity.end.as_mut().expect("ESTART seeded EpochState.end before shards run")
    which panics if EpochState.end is None; instead ensure end is initialized to
    EndStats::default() when missing (so subsequent EpochWrapUp can overwrite it).
    Update the apply code in crates/cardano/src/model/epochs.rs to replace the
    expect with a safe initialization (e.g., use get_or_insert_with / get_or_insert
    or similar) so entity.end becomes Some(EndStats::default()) if it was None
    before mutating it.
  • Around line 684-771: EpochEndAccumulate::undo currently assumes apply always
    mutated state, causing underflow/corruption when apply returned early; fix by
    recording in apply whether it performed mutations and the prior values needed to
    revert (capture previous end.effective_rewards, end.unspendable_to_treasury,
    end.unspendable_to_reserves and previous entity.ashard_progress/AShardProgress)
    into fields on EpochEndAccumulate (or a was_applied flag plus prev_* fields),
    then make undo a no-op if apply was skipped or restore those captured prev_*
    values and previous ashard_progress instead of unconditionally subtracting and
    setting committed/total; update apply (function EpochEndAccumulate::apply) to
    populate these fields only when it actually mutates state and add a proptest
    exercising apply→undo when ashard_progress is already advanced past
    completed_shard_index to lock the behavior.

In @crates/core/src/config.rs:

  • Around line 988-1002: Add runtime validation to reject invalid account_shards
    values by calling the existing validate_total_shards() from
    crates/cardano/src/ashard/shard.rs during config initialization: when
    constructing/using CardanoConfig (e.g., inside CardanoLogic::initialize() before
    any use of CardanoConfig::account_shards() or DEFAULT_ACCOUNT_SHARDS), call
    validate_total_shards(self.account_shards.unwrap_or(Self::DEFAULT_ACCOUNT_SHARDS))
    and return/propagate an error if it fails; ensure the initialization path stops
    the startup with a clear error rather than allowing invalid values (0 or
    non-divisors of 256) to be used.

In @skills/debug-epoch-mismatch/SKILL.md:

  • Line 138: The documentation still treats reward application as a single EWRAP
    work unit but the code split moves per-account reward application into AShard
    while Ewrap now handles MIR/global closure; update the tables that list or
    reference EWRAP (notably the "Classify the Mismatch", "Dolos Work Units →
    Haskell Concepts", and the instrumentation hypothesis table) to distinguish
    AShard vs Ewrap: point applyRUpd + reward filtering, unspendable routing and
    registration filtering to AShard and point MIR and boundary close to Ewrap, and
    adjust any bisection/debugging guidance so it routes to AShard for per-account
    reward issues and to Ewrap for MIR/global closure issues. Ensure references
    include the file/class names AShard, Ewrap and the function/concept names
    applyRUpd, MIR, reward filtering, boundary close to make locating code
    straightforward.

Outside diff comments:
In @crates/cardano/src/ashard/rewards.rs:

  • Around line 114-188: The per-shard logs in flush don't indicate which shard
    they belong to; update the flush logging to include shard context (shard_index
    and total_shards). Add shard_index: u32 and total_shards: u32 as fields on
    the AShard struct (or otherwise make them available on self or ctx), ensure any
    AShard construction/population passes these values, and then include
    %shard_index and %total_shards in the tracing::debug! and tracing::error!
    invocations (and the debug! at the end) around ctx.rewards/ drain_unspendable/
    applied_reward_credentials so all per-shard telemetry shows shard scope.

In @crates/cardano/src/model/epochs.rs:

  • Around line 980-981: The rustdoc warning comes from using a /// doc comment
    immediately above the prop_compose! macro (related to EpochStatsUpdate::apply
    and rolling.live_mut which asserts rolling.next is None), so replace that
    leading /// comment with a non-doc comment (//) or move the explanation into the
    generator body/closure so rustdoc won't try to document the macro invocation;
    specifically update the comment above the prop_compose! invocation that
    references rolling.live_mut and rolling.next to use // (or relocate it inside
    the prop_compose! closure) to silence the unused_doc_comments lint.

Nitpick comments:
In @crates/cardano/src/ashard/loading.rs:

  • Around line 19-57: The function load_pending_rewards_range currently accepts
    Option<Range<EntityKey>> but the only caller (load_ashard) always passes
    Some(range), so tighten the signature to take Range<EntityKey> directly:
    change the parameter type in load_pending_rewards_range and update callers
    (e.g., in load_ashard remove Some(range.clone()) and pass range.clone()
    directly); ensure the body drops any pattern-matching for None (no-op) and
    uses the range value when calling
    state.iter_entities_typed::<PendingRewardState>(PendingRewardState::NS, range).

In @crates/cardano/src/ashard/rewards.rs:

  • Around line 56-77: Update the stale comment referencing "EWRAP" to reflect the
    new pipeline stage: change the text around the account deregistration
    explanation to say "deregistered before AShard" (or "before the boundary's
    per-account leg") so it matches where the per-account registration check now
    runs; the relevant area to edit is the comment above the account.is_registered()
    branch in the function that calls reward.total_value() and
    ctx.rewards.return_reward_to_treasury(total). Ensure the updated comment keeps
    the existing clarification about pre-filtered accounts and returned_rewards
    behavior.

In @crates/cardano/src/ashard/shard.rs:

  • Around line 37-60: The function shard_key_range uses debug_assert! to check
    validate_total_shards(total_shards) and shard_index < total_shards, which leaves
    a division-by-zero and silent truncation in release; change the debug_assert!
    checks to real runtime checks (e.g.,
    assert!(validate_total_shards(total_shards).is_ok()) and assert!(shard_index <
    total_shards)) or convert the function to return Result and propagate
    validate_total_shards() error from shard_key_range so PREFIX_SPACE /
    total_shards cannot divide by zero and invalid total_shards are handled
    deterministically; update callers if you choose the Result approach and keep
    references to shard_key_range, validate_total_shards, and PREFIX_SPACE when
    making the change.

In @crates/cardano/src/ashard/work_unit.rs:

  • Around line 62-129: The load/commit_state duplication and drift risk can be
    fixed by capturing the computed total_shards and range during load and reusing
    them in commit_state instead of re-reading state: after computing total_shards
    and range in load (before calling BoundaryWork::load_ashard), store the values
    on the work unit (e.g., add fields self.total_shards and/or self.range and
    assign them there), and then change commit_state to take the boundary via
    self.boundary.as_mut() and use the stored range/total_shards when calling
    boundary.commit_ashard::<D>(...) rather than recomputing via load_epoch; update
    BoundaryWork usage only to consume the already-stored range if needed.

In @crates/cardano/src/ewrap/commit.rs:

  • Around line 91-100: The current call to stream_and_apply_namespace::<D,
    AccountState>(state, &writer, None) performs a full-table scan of AccountState
    each boundary; instead implement a targeted path that iterates over
    self.applied_mir_credentials (or the credentials from queued AssignRewards
    deltas), point-reads each AccountState, applies the MIR delta, and writes the
    result via the same writer so only affected accounts are touched; keep
    stream_and_apply_namespace for full-set namespaces like
    Pool/DRep/Proposal/EpochState, and ensure the new code reuses the existing
    apply/serialize logic used by stream_and_apply_namespace to maintain invariants
    and error handling.

In @crates/cardano/src/ewrap/loading.rs:

  • Around line 280-285: BoundaryWork's shard_applied_effective /
    shard_applied_unspendable_to_treasury / shard_applied_unspendable_to_reserves
    are dead for the Ewrap path (zeroed in new_empty and never read by Ewrap; real
    per-shard accumulators live in EpochState.end via EpochEndAccumulate and are
    consumed by wrapup.flush), so fix by either splitting the dual-purpose struct
    into two clear types (e.g., AShardWork and EwrapWork) and move the AShard-only
    shard_applied_* fields into AShardWork, or add explicit documentation/comments
    on BoundaryWork and those three fields noting they are owned/used only by the
    AShard path; update new_empty, any constructors, and usages of BoundaryWork
    (including references in EpochEndAccumulate and wrapup.flush) to use the new
    types or to reflect the documented ownership to prevent future misuse.

In @crates/cardano/src/lib.rs:

  • Around line 284-341: Extract and reuse the epoch load and the
    effective-account-shards logic: in initialize(), call load_epoch::<D>(state)
    once and store the resulting EpochState (if Ok) for the warnings; replace the
    inline computation of effective_account_shards with a call to the existing
    helper read_effective_account_shards (or lift that helper to a free fn if
    necessary) so the "persisted total else config" rule is implemented in one
    place; update references to use the single loaded epoch value for the
    tracing::warn! blocks and derive effective_account_shards from
    read_effective_account_shards(state, &config) (or equivalent) to remove the
    duplicate load_epoch::<D>(state) call.
  • Around line 404-408: The code refreshes self.effective_account_shards on every
    pop_work by calling read_effective_account_shards::<D>(domain.state()), but this
    value only matters when the epoch boundary handoff
    (EpochState.ashard_progress.total) can change; avoid the repeated read by first
    checking whether ashard_progress has changed and only calling
    read_effective_account_shards when it has. Add a small cached marker on the
    struct (e.g., last_ashard_total or last_ashard_progress) and in pop_work compare
    domain.state().ashard_progress.total (or the appropriate accessor) to that
    cached value; if different, call read_effective_account_shards and update both
    effective_account_shards and the cached marker, otherwise skip the read and keep
    the existing effective_account_shards.

In @crates/cardano/src/work.rs:

  • Around line 276-279: Add a new unit test that sets TEST_TOTAL_SHARDS > 1
    (instead of the current const TEST_TOTAL_SHARDS = 1) to exercise the multi-shard
    sequencing in the pop_work logic: drive the WorkBuffer/state machine (invoking
    pop_work) and assert that it emits exactly N consecutive AShard units (N ==
    TEST_TOTAL_SHARDS) before producing an EwrapBoundary (or EWrap/Ewrap as used in
    this code), and verify that last_point_seen and any cursor invariants are
    preserved across the transition; target the test to exercise the pop_work path
    that contains the next_index >= total_shards branch and use the existing
    helpers/setup used by the other tests so it integrates with the same state
    machine and coverage.
  • Around line 120-143: The code currently treats account_shards == 0 as a valid
    "no shards" path in on_ewrap_boundary (and similarly in the PreEwrapBoundary arm
    of pop_work), which contradicts validate_total_shards that requires
    account_shards >= 1; change the behavior to fail-fast: remove the special-case
    branch that returns EwrapBoundary when account_shards == 0 and instead assert or
    panic if account_shards < 1 (e.g., add assert!(account_shards >= 1) at the start
    of on_ewrap_boundary), keeping the normal Restart -> AShardingBoundary path for
    valid shard counts; alternatively, if the intended semantics are that 0 means
    "no shards", update validate_total_shards to accept 0 and document that
    invariant—pick one consistent approach and apply the same fix in the
    PreEwrapBoundary/pop_work code paths so both agree with validate_total_shards.

In @crates/cardano/work_units.md:

  • Around line 5-11: The fenced code block containing the epoch diagram starting
    with "Estart → Roll … → Rupd → Roll … → AShard ×N → Ewrap" is missing
    a language tag (MD040); fix it by adding a language identifier (use "text")
    immediately after the opening triple backticks so the block becomes ```text and
    the diagram passes markdownlint-cli2 validation.

In @tests/epoch_pots/main.rs:

  • Around line 500-513: Collapse the two separate if let Some(boundary) checks
    into a single binding for boundary when matching
    CardanoWorkUnit::AShard(shard): inside one if let Some(boundary) = shard.boundary() block, first check if shard.shard_index() == 0 to clear
    accumulated_applied and set accumulated_ending_epoch = Some(boundary.ending_state().number), then unconditionally call
    accumulated_applied.extend(boundary.applied_rewards.iter().cloned()); this
    removes the duplicated boundary lookup and makes the reset-and-extend logic
    contiguous.

</details>

<details>
<summary>🪄 Autofix (Beta)</summary>

Fix all unresolved CodeRabbit comments on this PR:

- [ ] Push a commit to this branch (recommended)
- [ ] Create a new PR with the fixes

</details>

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: defaults

**Review profile**: CHILL

**Plan**: Pro

**Run ID**: `13b48d06-fb3d-459b-b0a0-59ac2b808397`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 885094deed721abaa8bca14087703bd7ea409c28 and 8174d2a832bd81baa5228d8b3059b0b2e0bfd7ba.

</details>

<details>
<summary>📒 Files selected for processing (23)</summary>

* `crates/cardano/src/ashard/commit.rs`
* `crates/cardano/src/ashard/loading.rs`
* `crates/cardano/src/ashard/mod.rs`
* `crates/cardano/src/ashard/rewards.rs`
* `crates/cardano/src/ashard/shard.rs`
* `crates/cardano/src/ashard/work_unit.rs`
* `crates/cardano/src/ewrap/commit.rs`
* `crates/cardano/src/ewrap/loading.rs`
* `crates/cardano/src/ewrap/mod.rs`
* `crates/cardano/src/ewrap/work_unit.rs`
* `crates/cardano/src/ewrap/wrapup.rs`
* `crates/cardano/src/genesis/mod.rs`
* `crates/cardano/src/lib.rs`
* `crates/cardano/src/model/epochs.rs`
* `crates/cardano/src/model/mod.rs`
* `crates/cardano/src/model/pending.rs`
* `crates/cardano/src/rupd/work_unit.rs`
* `crates/cardano/src/work.rs`
* `crates/cardano/work_units.md`
* `crates/core/src/config.rs`
* `skills/debug-epoch-mismatch/SKILL.md`
* `tests/epoch_pots/main.rs`
* `tests/memory.rs`

</details>

</details>


Comment thread crates/cardano/src/model/epochs.rs
Comment on lines +737 to +740
```rust
let end = entity
    .end
    .as_mut()
    .expect("ESTART seeded EpochState.end before shards run");
```

⚠️ Potential issue | 🟡 Minor

apply panics if EpochState.end is None — fragile coupling to Genesis seeding.

The expect("ESTART seeded EpochState.end before shards run") relies on Genesis seeding end = Some(EndStats::default()) for the first boundary, and on EpochTransition re-seeding it on every subsequent transition. This is correct on the happy path but turns any future code path that constructs an EpochState with end = None (test fixtures, migration tools, recovery utilities, a genesis bootstrap regression) into a hard panic deep inside delta application.

Cheaper to defend: initialize end to EndStats::default() if absent rather than panic. The EpochWrapUp delta later overwrites it with the final stats, so semantics are preserved.

🛡️ Suggested change

```diff
-        let end = entity
-            .end
-            .as_mut()
-            .expect("ESTART seeded EpochState.end before shards run");
+        let end = entity.end.get_or_insert_with(EndStats::default);
```
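For reference, `Option::get_or_insert_with` gives exactly the "seed if absent" semantics the suggestion relies on. A self-contained demo, with `EndStats` reduced to a single stand-in field:

```rust
#[derive(Debug, Default, PartialEq)]
struct EndStats {
    effective_rewards: u64,
}

/// Seed `end` with the default if it is still None, then mutate it.
/// Never panics, unlike `.as_mut().expect(...)`.
fn accumulate(end: &mut Option<EndStats>, delta: u64) {
    let stats = end.get_or_insert_with(EndStats::default);
    stats.effective_rewards += delta;
}
```

A later `EpochWrapUp` overwriting the whole `end` slot is unaffected either way, so the defensive initialization is semantically free.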
🤖 Prompt for AI Agents

```
Verify each finding against the current code and only fix it if needed.

In `@crates/cardano/src/model/epochs.rs` around lines 737 - 740, The apply path
currently does entity.end.as_mut().expect("ESTART seeded EpochState.end before
shards run") which panics if EpochState.end is None; instead ensure end is
initialized to EndStats::default() when missing (so subsequent EpochWrapUp can
overwrite it). Update the apply code in crates/cardano/src/model/epochs.rs to
replace the expect with a safe initialization (e.g., use get_or_insert_with /
get_or_insert or similar) so entity.end becomes Some(EndStats::default()) if it
was None before mutating it.
```

scarmuega and others added 3 commits April 26, 2026 12:29
apply has three early-return guards (idempotent repeat, out-of-order,
total_shards mismatch) that leave state untouched. undo unconditionally
subtracted the deltas and overwrote ashard_progress, so a rollback
following a skipped apply would underflow the u64 end.* fields and
clobber the cursor.

Capture prev_ashard_progress and set an applied flag during apply only
when state is actually mutated; undo early-returns when !applied and
restores from the snapshot. Same pattern as EpochWrapUp/EpochStatsUpdate.

Also broaden any_epoch_state to vary ashard_progress so the existing
roundtrip proptests for EpochWrapUp and EpochTransition exercise the
Some(_) → None → Some(_) path their apply/undo introduced earlier in
this branch (previously only None → None was covered). Add a dedicated
epoch_end_accumulate_roundtrip proptest covering all four progress
shapes and all three skip branches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The divides-256 invariant on account_shards is enforced only via
debug_assert! in shard_key_range(), which is stripped in release
builds. An invalid TOML value (0, 3, 7, 100, ...) would deserialize
cleanly and silently corrupt key-range coverage.

Call validate_total_shards() at the top of CardanoLogic::initialize
and surface failures as ChainError::InvalidConfig so misconfiguration
fails the startup with a clear message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After this branch's restructure, per-account reward application,
unspendable routing, and EWRAP-time registration filtering live in
AShard; only MIRs, refunds, and boundary close remain in Ewrap.
Update the Classify, Work Units, Source Files, and Instrumentation
tables so the bisection workflow points at the right work unit
(and module path) for each failure shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
crates/cardano/src/lib.rs (1)

250-258: Minor: deduplicate effective-shard-count computation.

The body of read_effective_account_shards (lines 253-258) is identical to the inline block at lines 345-348 in initialize. Since Self isn't yet constructed there, the method form can't be reused, but extracting a free helper (or Self::compute_effective_account_shards<D>(state, &config)) would keep both call sites in sync if the fallback rule ever changes.

♻️ Suggested refactor
```diff
-    /// Compute the effective `account_shards` value: stored
-    /// `ashard_progress.total` if a boundary is in flight, otherwise the
-    /// configured value.
-    fn read_effective_account_shards<D: Domain>(&self, state: &D::State) -> u32 {
-        load_epoch::<D>(state)
-            .ok()
-            .and_then(|e| e.ashard_progress.as_ref().map(|p| p.total))
-            .unwrap_or_else(|| self.config.account_shards())
-    }
+    /// Compute the effective `account_shards` value: stored
+    /// `ashard_progress.total` if a boundary is in flight, otherwise the
+    /// configured value.
+    fn compute_effective_account_shards<D: Domain>(state: &D::State, config: &CardanoConfig) -> u32 {
+        load_epoch::<D>(state)
+            .ok()
+            .and_then(|e| e.ashard_progress.as_ref().map(|p| p.total))
+            .unwrap_or_else(|| config.account_shards())
+    }
+
+    fn read_effective_account_shards<D: Domain>(&self, state: &D::State) -> u32 {
+        Self::compute_effective_account_shards::<D>(state, &self.config)
+    }
```

Then at lines 345-348:

```diff
-        let effective_account_shards = load_epoch::<D>(state)
-            .ok()
-            .and_then(|e| e.ashard_progress.as_ref().map(|p| p.total))
-            .unwrap_or_else(|| config.account_shards());
+        let effective_account_shards =
+            Self::compute_effective_account_shards::<D>(state, &config);
```

Also applies to: 343-348

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/cardano/src/lib.rs` around lines 250 - 258, The effective-shard-count
computation is duplicated between the method read_effective_account_shards and
the inline block in initialize; extract the logic into a single helper (either a
free function like compute_effective_account_shards::<D>(state, &config) or an
associated fn Self::compute_effective_account_shards::<D>(state, &config)) that
returns the u32 by loading the epoch and falling back to
config.account_shards(), then replace both the body of
read_effective_account_shards and the inline compute in initialize to call that
helper so both call sites stay in sync if the fallback changes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3543b5af-c9cb-43d0-942e-a2c733a526c3

📥 Commits

Reviewing files that changed from the base of the PR and between 8174d2a and aa835bb.

📒 Files selected for processing (4)
  • crates/cardano/src/lib.rs
  • crates/cardano/src/model/epochs.rs
  • crates/core/src/lib.rs
  • skills/debug-epoch-mismatch/SKILL.md
✅ Files skipped from review due to trivial changes (1)
  • crates/core/src/lib.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • skills/debug-epoch-mismatch/SKILL.md
